In this vignette, we explore how to label your track files (activity and pressure) and provide tips to make the exercise more efficient. To see where this exercise fits in the overall process, see the section “Editing on TRAINSET” of the basic example vignette.
Motivation
The most important reason motivating manual editing is that pressure mapping relies on precise activity and pressure data. Activity labeling defines stationary periods and flight duration. Short stationary periods can be particularly hard to define, such that expert knowledge is essential. Since flight duration impacts the movement model, having an accurate flight duration is critical to correctly estimate the bird’s position, especially for short stationary periods when pressure mapping is not precise. The pressure timeseries matching algorithm is highly sensitive to erroneously labeled pressure, such that even a single mislabeled datapoint can throw off the estimation map.
Each species’ timeseries is so specific that manual editing remains the fastest option. You can expect to spend between 30 seconds (e.g. Mangrove Kingfisher) and 10 minutes (e.g. Eurasian Nightjar) per track, depending on the complexity of the species’ migration.
Manual editing also provides a sense of what the bird is doing. You’ll get to know how the bird is moving (e.g. long continuous high altitude flight, short flights over multiple days, alternation between short migration flights and stopovers, etc.). It also provides a sense of the uncertainty of your classification, which is useful for understanding and interpreting the modeled track.
That being said, it is worth starting the manual editing process with automatically labeled timeseries. Refer to the PAMLr manual for possible classification methods.
Basic labeling principles
The exercise involves labeling migratory activity as 1 in the acceleration column and identifying pressure points to be discarded from the matching exercise with 1 in the pressure column. However, several small pitfalls can make this task difficult.
The outcome of the activity labeling is twofold: (1) defined stationary periods, during which the bird is considered static relative to the size of the grid (~10-30km). The start and end of each stationary period are used to define the pressure timeseries to be matched. (2) defined flight durations, which are used in the movement model to define the distance between stationary periods.
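As a rough illustration of how activity labels translate into these two outcomes, the base R sketch below splits a labeled series into stationary periods and flights using run-length encoding. The helper name `label_to_periods`, the regular 5-minute resolution, and the toy data are assumptions made for illustration; this is not how `pam_sta()` is implemented.

```r
# Hypothetical helper (not part of GeoPressureR): derive stationary periods
# and flight durations from a logical vector of migratory-activity labels.
label_to_periods <- function(date, is_flight) {
  runs <- rle(is_flight)                       # consecutive runs of identical labels
  end_idx <- cumsum(runs$lengths)              # index of the last datapoint of each run
  start_idx <- end_idx - runs$lengths + 1      # index of the first datapoint of each run
  # Assumes a regular sampling interval
  dt <- as.numeric(difftime(date[2], date[1], units = "mins"))
  data.frame(
    type = ifelse(runs$values, "flight", "stationary"),
    start = date[start_idx],
    end = date[end_idx],
    duration_mins = runs$lengths * dt
  )
}

# Toy series at 5-minute resolution: 3 stationary, 4 flight, 2 stationary datapoints
date <- as.POSIXct("2017-08-30 22:00:00", tz = "UTC") + seq(0, by = 300, length.out = 9)
is_flight <- c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE)
periods <- label_to_periods(date, is_flight)
print(periods)
```

Each row is either a stationary period (whose start and end delimit the pressure to match) or a flight (whose duration feeds the movement model), which is why a single mislabeled datapoint changes both outcomes at once.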
## Intro to the tool used for manual editing: TRAINSET

TRAINSET (www.trainset.geocene.com) is “a lightweight web application for brushing labels onto time series data; useful for building training sets.”
You can read more about TRAINSET on www.trainset.geocene.com and https://github.com/Geocene/trainset.
The tool interface is quite intuitive. Start by uploading your .csv file (e.g., 18IC_act_pres.csv).

View after uploading a file
A few tips:
- Keyboard shortcuts considerably speed up navigation (zoom in/out, move left/right) and labeling (add/remove a label).
- Because of the number of datapoints (depending on the resolution of the track), keeping a narrow temporal window will prevent your browser from becoming slow or unresponsive.
- You can change the “Reference Series” to pressure to see both timeseries at the same time, which helps interpret what the bird is doing.
- Play with the y-axis range to properly see small pressure variations and, importantly, to spot outliers which may not be visible at full range (e.g., 500-1500 hPa).
- TRAINSET offers more flexibility with labels than required: you can add and remove label values (bottom-right of the page). For trainset_read() to work, do not change, edit, or add any label; simply use the ones offered: “TRUE” and “FALSE”.
## Three tests to check labeling
To assess the quality of your labeling, you can use this script, which comprises three basic tests.
Refer to the basic example vignette to read background information on the processing steps before labeling.
library(GeoPressureR)

pam_data = pam_read(pathname = system.file("extdata", package = "GeoPressureR"),
                    crop_start = "2017-06-20", crop_end = "2018-05-02")

Test 1: Duration of stationary periods and flights
A first test to check your labeling is accurate is to compute the durations of flights and stationary periods. In this example, I use the first version exported from TRAINSET:
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v1.csv")
pam_data = pam_sta(pam_data)
knitr::kable(pam_data$sta[difftime(pam_data$sta$end, pam_data$sta$start, units = "mins") < 60 | pam_data$sta$next_flight_duration < 30, ])

| | start | end | duration | next_flight_duration | sta_id |
|---|---|---|---|---|---|
| 7 | 2017-08-30 23:45:00 | 2017-08-30 23:55:00 | 10 mins | 255 mins | 7 |
| 27 | 2018-04-15 19:30:00 | 2018-04-15 20:10:00 | 40 mins | 85 mins | 27 |
| 30 | 2018-04-29 23:35:00 | 2018-04-29 23:45:00 | 10 mins | 170 mins | 30 |
| 32 | 2018-04-30 19:20:00 | 2018-04-30 19:40:00 | 20 mins | 125 mins | 32 |
| 33 | 2018-04-30 21:45:00 | 2018-04-30 21:55:00 | 10 mins | 65 mins | 33 |
| 34 | 2018-04-30 23:00:00 | 2018-04-30 23:10:00 | 10 mins | 50 mins | 34 |
| 35 | 2018-05-01 00:00:00 | 2018-05-01 00:10:00 | 10 mins | 35 mins | 35 |
| 36 | 2018-05-01 00:45:00 | 2018-05-01 23:30:00 | 1365 mins | 0 mins | 36 |
You may want to discard stationary periods and flights that are shorter than a couple of hours. Using the exact times from the table above, you can edit your labeling in TRAINSET and export a new version of the csv file. Note that the last row has a next_flight_duration of 0 because it is the last stationary period.
Test 2: Pressure timeseries
The second check to carry out before computing the map is to visualize the pressure timeseries and their grouping into stationary periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v2.csv")
pam_data = pam_sta(pam_data)
library(ggplot2)
library(plotly)

p <- ggplot() +
  geom_point(data = pam_data$pressure, aes(x = date, y = obs), col = "grey", size = 0.1) +
  geom_line(data = subset(pam_data$pressure, sta_id != 0),
            aes(x = date, y = obs, col = as.factor(sta_id))) +
  theme_bw() +
  scale_colour_manual(values = rep(RColorBrewer::brewer.pal(9, "Set1"), times = 4))
# scale_colour_brewer(type = "qualitative", palette = "Set1")
ggplotly(p, dynamicTicks = TRUE)

Plotting this figure with plotly allows you to zoom in and pan to check that all timeseries are correctly grouped. Make sure each stationary period does not include any pressure measurement from flight. You might spot some anomalies in the temporal variation of pressure. In some cases, you can already label the pressure timeseries to remove them. You can then export your file as a new version.
Test 3: Pressure match for long stationary periods
So far we have checked that the pressure timeseries are correctly labeled with their respective stationary periods, and that they look relatively smooth. The third test consists of finding the location with the best match and comparing the pressure timeseries. This makes it possible to distinguish bird movements from natural variations of the pressure.
We recommend starting with the long stationary periods, and once results are satisfying, moving to the shorter periods.
pam_data = trainset_read(pam_data, system.file("extdata", package = "GeoPressureR"), filename = "18LX_act_pres-labeled-v3.csv")
pam_data = pam_sta(pam_data)
sta_id_keep = pam_data$sta$sta_id[difftime(pam_data$sta$end,pam_data$sta$start, units = "hours")>12]
pam_data$pressure$sta_id[!(pam_data$pressure$sta_id %in% sta_id_keep)] = NA
message("Number of stationary periods to query: ", length(sta_id_keep))

We can now query the data.
raster_list = geopressure_map(pam_data$pressure, extent=c(-16,20,0,50), scale=10, max_sample=100)
prob_map_list = geopressure_prob_map(raster_list)

For each stationary period, we locate the best match and query the pressure timeseries at this location. If you get errors, check the probability map and the best match (see the commented line starting with leaflet()).
ts_list = list()
for (i_r in seq_along(prob_map_list)) {
  i_s = raster::metadata(prob_map_list[[i_r]])$sta_id

  # Find the location with the maximum probability
  tmp = as.data.frame(prob_map_list[[i_r]], xy = TRUE)
  lon = tmp$x[which.max(tmp[, 3])]
  lat = tmp$y[which.max(tmp[, 3])]

  # Select the pressure datapoints belonging to this stationary period
  id = pam_data$pressure$sta_id == i_s & !is.na(pam_data$pressure$sta_id)

  # Visual check
  # leaflet() %>% addTiles() %>% addRasterImage(prob_map_list[[i_r]]) %>% addMarkers(lat = lat, lng = lon)

  # Query the pressure at this location
  message("query: ", i_r, "/", length(sta_id_keep))
  ts_list[[i_r]] = geopressure_ts(lon,
    lat,
    pressure = list(
      obs = pam_data$pressure$obs[id],
      date = pam_data$pressure$date[id]
    )
  )

  # Add sta_id
  ts_list[[i_r]]["sta_id"] = i_s

  # Shift the queried pressure so that its mean matches the mean of the observations
  ts_list[[i_r]]$pressure0 = ts_list[[i_r]]$pressure - mean(ts_list[[i_r]]$pressure) + mean(pam_data$pressure$obs[id])
}

# Save the data for vignette
# usethis::use_data(ts_list, prob_map_list)

We can now look at a similar figure of pressure timeseries, but this time comparing the geolocator data with the best match from the reanalysis data.
p <- ggplot() +
  geom_point(data = as.data.frame(pam_data$pressure), aes(x = date, y = obs), colour = "grey", size = 0.5) +
  geom_line(data = do.call("rbind", ts_list), aes(x = date, y = pressure0, col = as.factor(sta_id))) +
  theme_bw() +
  scale_colour_manual(values = rep(RColorBrewer::brewer.pal(9, "Set1"), times = 4))
ggplotly(p, dynamicTicks = TRUE) # %>% layout(legend = list(orientation = "h", x = -0.5))

You can use this figure together with the accelerometer data on TRAINSET to establish when the bird moves, and whether it changes location or just flies to another elevation in the same area. If it moves to a new location, this should become a new stationary period. If not, it might be more appropriate to label all the pressure datapoints at the elevations that you wish to discard.
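To complement the visual comparison, the base R sketch below applies the same mean alignment used for pressure0 to toy data and lists the datapoints that depart most from the reference. The toy series and the 2 hPa threshold are assumptions chosen for illustration only; the flagged datapoints are candidates to inspect in TRAINSET, not an automatic classification.

```r
# Toy data: hourly reference pressure (standing in for the reanalysis at the
# best-match location) and geolocator observations
date <- as.POSIXct("2018-01-01 00:00:00", tz = "UTC") + seq(0, by = 3600, length.out = 12)
reference <- 1010 + sin(seq(0, pi, length.out = 12))  # smooth natural variation
obs <- reference - 25                                 # constant offset due to elevation
obs[6:7] <- obs[6:7] - 5                              # short excursion to a higher elevation

# Shift the reference so that its mean matches the mean of the observations
reference0 <- reference - mean(reference) + mean(obs)

# Flag datapoints whose residual exceeds a hypothetical threshold (in hPa)
threshold <- 2
residual <- obs - reference0
candidates <- date[abs(residual) > threshold]
print(candidates)
```

Because the excursion itself shifts the mean of the observations, the remaining datapoints also carry a small residual, which is one reason to confirm any flagged datapoint against the accelerometer data before relabeling.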
Common challenges and tips to address them

Missing a single datapoint in acceleration will create a stationary period and split the flight in two. This happens regularly with the KNN classifier.
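One way to find or pre-fill such gaps before opening TRAINSET is sketched below in base R. The helper name `fill_short_gaps` and its `max_gap` parameter are hypothetical and not part of GeoPressureR; the sketch assumes a logical label vector without missing values, and any relabeled datapoint should still be checked visually.

```r
# Hypothetical helper (not part of GeoPressureR): relabel as flight any run of
# non-flight datapoints of length <= max_gap that sits between two flight runs.
fill_short_gaps <- function(is_flight, max_gap = 1) {
  runs <- rle(is_flight)
  n <- length(runs$values)
  if (n >= 3) {
    # Runs alternate, so an inner non-flight run is always flanked by flight runs
    inner <- 2:(n - 1)
    short <- !runs$values[inner] & runs$lengths[inner] <= max_gap
    runs$values[inner[short]] <- TRUE
  }
  inverse.rle(runs)
}

# The single FALSE at position 4 splits one flight in two; the two FALSE at
# positions 7-8 form a longer gap and are left untouched.
is_flight <- c(FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
filled <- fill_short_gaps(is_flight)
print(filled)
```

Leading and trailing non-flight runs are never modified, so the first and last stationary periods keep their boundaries.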

Accurate classification of flight duration can be difficult when the bird migrates with less intensity at the end of the flight.

Defining stationary periods exactly can be difficult for some species (here Tawny Pipit), with activity before 6am that could be low-intensity migratory movement, long non-migratory activity (feeding), or anything in between!
There will be situations where classifying activity with certainty is not possible. It is worth remembering that the purpose of activity labeling is twofold:
- Define flight duration, which is used in the movement model and ultimately has the strongest impact on (1) the estimation of the position of short stopovers between long flights and (2) the estimation of flight speed when the position of the bird is well constrained. Ultimately, a few datapoints more or less won’t have a strong impact on long flights, but the estimation of short movements can be relatively tricky. To partially accommodate this, we compute an effort_duration for each flight, which normalizes the duration of the migratory flight by the intensity of the activity over the entire journey of the bird.
- Define stationary periods, which are used in the pressure timeseries matching.
At this stage, it is very useful to add the pressure timeseries to understand the implications of defining stationary periods for the pressure timeseries.

Although this Red-capped Robin-chat was not very active during this morning, you can notice a drop in pressure after 9pm, while a similar level of activity before 9am on the next day doesn’t affect the pressure timeseries.
I think it’s best to think of a stationary period as a period during which the pressure timeseries is continuous enough to be matched on the map.
A balance needs to be found between creating enough stationary periods to account for every position of the bird that can be estimated, and creating too many stationary periods, which shortens the timeseries available for matching. This is important because we aim to create long pressure timeseries containing sufficient temporal variation, but not variation due to local/short movements (often caused by altitude variation).
So, one option is to label activity to create a new stationary period. The other option, which avoids creating too many stationary periods, is to label the pressure datapoints as outliers. These datapoints won’t be used when matching the timeseries.

Here is a possible way to handle the Red-capped Robin-chat example. Tightening the pressure y-axis while widening the time x-axis makes it easier to see the generally smooth natural temporal variation of pressure that we want to capture. The fine-scale temporal variation of pressure can then be attributed to local movement of the bird (e.g. foraging in an area with topographical variation). My proposition here is to create a new stationary period of a couple of hours, and then to mark as outliers the pressure datapoints up to the 13th, because they vary too much, as well as the variation around 9am.
Future improvements
A lot can be done to improve this process:
- Run TRAINSET offline.
- Bypass the create csv, upload csv, read csv steps by running a browser session directly in R.
- Build an R (shiny) equivalent of TRAINSET to be directly integrated with the R package. Problem: I can’t find a good package to label points in a figure in R, and it would have to be maintained, while TRAINSET is doing that for free.
- Any suggestions? Write an issue on GitHub.